
    A Review of Methods for the Analysis of the Expected Value of Information

    Over recent years, Value of Information analysis has become more widespread in health-economic evaluations, specifically as a tool for probabilistic sensitivity analysis. This is largely due to methodological advances that allow fast computation of a typical summary known as the Expected Value of Partial Perfect Information (EVPPI). A recent review discussed some estimation methods for calculating the EVPPI, but because research has remained active in the intervening years, that review does not cover several key estimation methods. This paper therefore presents a comprehensive review of these new methods. We begin by providing the technical details of these computation methods. We then present a case study to compare their estimation performance. We conclude that the most recent development, based on non-parametric regression, offers the most efficient method for calculating the EVPPI. This means that the EVPPI can now be used practically in health economic evaluations, especially as all the methods are developed in parallel with
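    To make the regression-based estimator concrete, here is a minimal sketch of the general nonparametric-regression idea for EVPPI: regress simulated net benefits on the parameter of interest, then compare the expected maximum of the fitted values with the maximum of the expected values. The decision problem, the net-benefit functions, and the cubic-polynomial smoother below are all illustrative assumptions, not the paper's case study.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical probabilistic sensitivity analysis sample:
    # theta is the parameter of interest; nb[:, d] is the net benefit of decision d.
    n = 10_000
    theta = rng.normal(0.0, 1.0, n)
    nb = np.column_stack([
        1.0 + 0.5 * theta + rng.normal(0.0, 0.3, n),   # decision 0
        1.2 - 0.4 * theta + rng.normal(0.0, 0.3, n),   # decision 1
    ])

    # Nonparametric-regression EVPPI estimator: regress each decision's net
    # benefit on theta, then compare the mean of the fitted maxima with the
    # maximum of the means. A cubic polynomial stands in for a flexible smoother.
    fitted = np.column_stack([
        np.polyval(np.polyfit(theta, nb[:, d], 3), theta) for d in range(2)
    ])
    evppi = fitted.max(axis=1).mean() - nb.mean(axis=0).max()
    ```

    The regression step is what makes the estimator fast: it replaces a nested Monte Carlo loop with a single smoothing pass over the existing PSA sample.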

    A Bayesian partial identification approach to inferring the prevalence of accounting misconduct

    This paper describes the use of flexible Bayesian regression models for estimating a partially identified probability function. Our approach permits efficient sensitivity analysis concerning the posterior impact of priors on the partially identified component of the regression model. The new methodology is illustrated on an important problem where only partially observed data are available: inferring the prevalence of accounting misconduct among publicly traded U.S. businesses.
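    A toy illustration of why prior sensitivity matters under partial identification: if we only observe the rate of *detected* misconduct, obs = prevalence × p_detect, the data alone cannot separate the two factors, so the posterior on prevalence tracks the prior on the detection probability. The simple ratio model and all numbers are hypothetical, not the paper's model.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    obs_rate = 0.04          # hypothetical observed rate of detected misconduct

    # Two alternative Beta priors on the (unidentified) detection probability:
    # the induced posterior mean for prevalence changes with the prior, which is
    # exactly what a partial-identification sensitivity analysis probes.
    means = {}
    for a, b in [(2, 2), (5, 2)]:
        p_detect = rng.beta(a, b, 100_000)
        prevalence = np.clip(obs_rate / p_detect, 0.0, 1.0)
        means[(a, b)] = prevalence.mean()
    ```

    With the flatter Beta(2, 2) prior (lower average detection probability), the implied prevalence is higher than under the Beta(5, 2) prior; the data cannot adjudicate between them.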

    Counterfactual Learning with Multioutput Deep Kernels

    In this paper, we address the challenge of performing counterfactual inference with observational data via Bayesian nonparametric regression adjustment, with a focus on high-dimensional settings featuring multiple actions and multiple correlated outcomes. We present a general class of counterfactual multi-task deep kernel models that estimate causal effects and learn policies proficiently, thanks to their sample-efficiency gains, while scaling well to high dimensions. In the first part of the work, we rely on Structural Causal Models (SCM) to formally introduce the setup and the problem of identifying counterfactual quantities under observed confounding. We then discuss the benefits of tackling causal effect estimation via stacked coregionalized Gaussian processes and deep kernels. Finally, we demonstrate the use of the proposed methods on simulated experiments that span individual causal effect estimation, off-policy evaluation and optimization.
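    The coregionalization idea can be sketched in a few lines: an intrinsic coregionalization model (ICM) builds a multi-output GP covariance as the Kronecker product of an output-covariance matrix B and an input kernel k(x, x'). The rank-1 B and RBF kernel below are illustrative choices, not the paper's architecture.

    ```python
    import numpy as np

    # ICM covariance: K((x, i), (x', j)) = B[i, j] * k(x, x'),
    # the building block behind stacked coregionalized Gaussian processes.
    def rbf(A, B_, ls=1.0):
        d2 = ((A[:, None, :] - B_[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls**2)

    X = np.linspace(0.0, 1.0, 5)[:, None]   # 5 inputs
    W = np.array([[1.0], [0.5]])            # 2 outcomes, rank-1 coregionalization
    B = W @ W.T + 0.1 * np.eye(2)           # output-covariance matrix (positive definite)
    K = np.kron(B, rbf(X, X))               # full multi-output covariance (10 x 10)
    ```

    Because K factorises into B and the input Gram matrix, correlated outcomes share statistical strength: observing one outcome at x informs the others, which is the source of the sample-efficiency gains the abstract mentions.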

    Estimating the Expected Value of Sample Information across Different Sample Sizes using Moment Matching and Non-Linear Regression

    Background: The Expected Value of Sample Information (EVSI) determines the economic value of any future study with a specific design aimed at reducing uncertainty in a health economic model. This has potential as a tool for trial design; the cost and value of different designs could be compared to find the trial with the greatest net benefit. However, despite recent developments, EVSI analysis can be slow, especially when optimising over a large number of different designs. Methods: This paper develops a method to reduce the computation time required to calculate the EVSI across different sample sizes. Our method extends the moment matching approach to EVSI estimation to optimise over different sample sizes for the underlying trial, with a computational cost similar to that of a single EVSI estimate. This extension calculates posterior variances across the alternative sample sizes and then uses Bayesian non-linear regression to calculate the EVSI. Results: A health economic model developed to assess the cost-effectiveness of interventions for chronic pain demonstrates that this EVSI calculation method is fast and accurate for realistic models. This example also highlights how different trial designs can be compared using the EVSI. Conclusion: The proposed estimation method is fast and accurate when calculating the EVSI across different sample sizes. This will allow researchers to realise the potential of using the EVSI to determine an economically optimal trial design for reducing uncertainty in health economic models. Limitations: Our method relies on some additional simulation, which can be expensive in models with a very large computational cost.
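    The core idea of fitting posterior variance as a nonlinear function of sample size can be sketched with a conjugate-normal toy model, where the posterior variance shrinks as var(n) = s0·n0/(n0 + n) for an effective prior sample size n0. Estimating n0 from a handful of sample sizes then lets us predict the variance (and hence an EVSI-style value) at any n. The grid-search fit and all numbers are illustrative, not the authors' implementation.

    ```python
    import numpy as np

    # Illustrative conjugate-normal shrinkage: posterior variance as a function
    # of trial sample size n, var(n) = s0 * n0 / (n0 + n).
    s0 = 4.0                                  # prior variance (assumed)
    n0_true = 25.0                            # true effective prior sample size
    sizes = np.array([10, 50, 200])           # the few sample sizes we simulate at
    post_var = s0 * n0_true / (n0_true + sizes)   # "observed" posterior variances

    # Nonlinear least squares by grid search over n0 (a stand-in for the paper's
    # Bayesian non-linear regression step).
    grid = np.linspace(1.0, 200.0, 2000)
    sse = [((s0 * n0 / (n0 + sizes) - post_var) ** 2).sum() for n0 in grid]
    n0_hat = grid[int(np.argmin(sse))]

    def predict_var(n):
        # Predicted posterior variance at an arbitrary, unsimulated sample size.
        return s0 * n0_hat / (n0_hat + n)
    ```

    Once the curve is fitted, the EVSI can be evaluated across a whole grid of candidate sample sizes without re-running the health economic model, which is where the computational saving comes from.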

    Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability

    Understanding the shopping motivations behind market baskets has high commercial value in the grocery retail industry. Analyzing shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while keeping outcomes interpretable. Latent Dirichlet Allocation (LDA) provides a suitable framework to process grocery transactions and to discover a broad representation of customers' shopping motivations. However, summarizing the posterior distribution of an LDA model is challenging: individual LDA draws may not be coherent and cannot capture topic uncertainty. Moreover, the evaluation of LDA models is dominated by model-fit measures, which may not adequately capture qualitative aspects such as the interpretability and stability of topics. In this paper, we introduce a clustering methodology that post-processes posterior LDA draws to summarise the entire posterior distribution and identify semantic modes represented as recurrent topics. Our approach is an alternative to standard label-switching techniques and provides a single posterior summary set of topics, together with associated measures of uncertainty. Furthermore, we establish a more holistic definition of model evaluation, which assesses topic models based not only on their likelihood but also on their coherence, distinctiveness and stability. By means of a survey, we set thresholds for the interpretation of topic coherence and topic similarity in the domain of grocery retail data. We demonstrate that the selection of recurrent topics through our clustering methodology not only improves model likelihood but also improves qualitative aspects of LDA such as interpretability and stability. We illustrate our methods on an example from a large UK supermarket chain.
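    A minimal sketch of the post-processing idea: topics from separate posterior draws are matched by similarity, and recurrent topics are the clusters that appear in every draw, which also absorbs label switching across draws. The toy draws, the greedy clustering rule, and the 0.95 cosine threshold are all illustrative assumptions, not the paper's algorithm.

    ```python
    import numpy as np

    # Toy posterior: 3 LDA draws x 2 topics over a 4-word vocabulary.
    # Note the label switching in the third draw.
    draws = np.array([
        [[0.70, 0.10, 0.10, 0.10], [0.10, 0.10, 0.40, 0.40]],
        [[0.65, 0.15, 0.10, 0.10], [0.10, 0.10, 0.45, 0.35]],
        [[0.10, 0.10, 0.35, 0.45], [0.70, 0.10, 0.10, 0.10]],
    ])

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Greedy clustering: join the first cluster whose centroid is similar
    # enough, otherwise start a new cluster.
    clusters = []
    for draw in draws:
        for topic in draw:
            for c in clusters:
                if cos(topic, np.mean(c, axis=0)) > 0.95:
                    c.append(topic)
                    break
            else:
                clusters.append([topic])

    # Recurrent topics: clusters that show up in every posterior draw.
    recurrent = [np.mean(c, axis=0) for c in clusters if len(c) == len(draws)]
    ```

    Each recurrent topic's cluster also yields a spread around its centroid, giving the per-topic uncertainty measure the abstract refers to.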

    Interpretable Deep Causal Learning for Moderation Effects

    In this extended abstract paper, we address the problem of interpretability and targeted regularization in causal machine learning models. In particular, we focus on estimating individual causal/treatment effects under observed confounders, which can be controlled for and which moderate the effect of the treatment on the outcome of interest. Black-box ML models adjusted for the causal setting generally perform well in this task, but they lack interpretable output identifying the main drivers of treatment heterogeneity and their functional relationship. We propose a novel deep counterfactual learning architecture for estimating individual treatment effects that can simultaneously: i) convey targeted regularization on, and quantify uncertainty around, the quantity of interest (i.e., the Conditional Average Treatment Effect); ii) disentangle baseline prognostic and moderating effects of the covariates and output interpretable score functions describing their relationship with the outcome. Finally, we demonstrate the use of the method via a simple simulated experiment.
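    The additive decomposition behind such architectures, y = mu(x) + t·tau(x), can be illustrated with a linear toy version in which the interaction columns of the design matrix identify the moderation effect directly. The data-generating process and the OLS fit are a sketch of the decomposition only, not the proposed deep architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n = 5000
    x = rng.normal(size=n)                     # observed confounder/moderator
    t = rng.integers(0, 2, n)                  # binary treatment
    # Outcome: baseline prognostic part mu(x) = 1 + 2x, plus a moderated
    # treatment effect tau(x) = 0.5 + 1.5x (illustrative coefficients).
    y = 1.0 + 2.0 * x + t * (0.5 + 1.5 * x) + rng.normal(0.0, 0.1, n)

    # Design matrix [1, x, t, t*x]: the last two columns identify tau(x),
    # while the first two identify the baseline mu(x).
    X = np.column_stack([np.ones(n), x, t, t * x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    tau_hat = lambda xv: beta[2] + beta[3] * xv   # interpretable CATE function
    ```

    In the deep version, mu and tau are replaced by neural components, but the same separation is what makes the moderation effect readable and lets regularization target tau alone.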

    Mixture polarization in inter-rater agreement analysis: a Bayesian nonparametric index

    In several observational contexts where different raters evaluate a set of items, it is common to assume that all raters draw their scores from the same underlying distribution. However, many scientific works have demonstrated the relevance of individual variability in different types of rating tasks. To address this issue, the intra-class correlation coefficient (ICC) has been used as a measure of variability among raters within the hierarchical linear models approach. A common distributional assumption in this setting is to specify the hierarchical effects as independent and identically distributed draws from a normal distribution with mean fixed to zero and unknown variance. The present work overcomes this strong assumption in inter-rater agreement estimation by placing a Dirichlet process mixture prior on the hierarchical effects. A new nonparametric index λ is proposed to quantify rater polarization in the presence of group heterogeneity. The model is applied to a set of simulated experiments and real-world data. Possible future directions are discussed.
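    A short sketch of the key modelling ingredient: drawing rater effects from a (truncated) Dirichlet process mixture via stick-breaking, instead of a single zero-mean normal. The concentration parameter, truncation level, and atom distribution below are illustrative choices, not the paper's specification.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    alpha, K = 1.0, 50                        # DP concentration, truncation level

    # Truncated stick-breaking construction of DP mixture weights:
    # w_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha).
    v = rng.beta(1.0, alpha, K)
    w = v * np.concatenate([[1.0], np.cumprod(1 - v)[:-1]])

    # Cluster locations for the rater effects (base measure, here N(0, 2^2)).
    atoms = rng.normal(0.0, 2.0, K)

    # Rater effects: a clustered, possibly multimodal distribution -- exactly
    # what a single zero-mean normal prior cannot represent.
    effects = rng.choice(atoms, size=200, p=w / w.sum())
    ```

    Because the mixture can place raters in distinct clusters, a polarization index can then be computed from how the posterior mass splits across those clusters.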

    Posterior summaries of grocery retail topic models: Evaluation, interpretability and credibility

    Understanding the shopping motivations behind market baskets has significant commercial value for the grocery retail industry. The analysis of shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while delivering interpretable outcomes. Latent Dirichlet allocation (LDA) allows the processing of grocery transactions and the discovery of customer behaviours. Interpretations of topic models typically exploit individual samples, overlooking the uncertainty of single topics. Moreover, training LDA multiple times shows topics with large uncertainty, that is, topics that (dis)appear in some but not all posterior samples, concurring with various authors in the field. In response, we introduce a clustering methodology that post-processes posterior LDA draws to summarise topic distributions represented as recurrent topics. Our approach identifies clusters of topics that belong to different samples and provides associated measures of uncertainty for each group. Our proposed methodology allows the identification of an unconstrained number of customer behaviours presented as recurrent topics. We also establish a more holistic framework for model evaluation, which assesses topic models based not only on their predictive likelihood but also on quality aspects such as the coherence and distinctiveness of single topics and the credibility of a set of topics. Using the outcomes of a tailored survey, we set thresholds that aid in interpreting quality aspects in grocery retail data. We demonstrate that selecting recurrent topics not only improves predictive likelihood but also improves interpretability and credibility. We illustrate our methods with an example from a large British supermarket chain.
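    One of the quality aspects mentioned, topic coherence, can be sketched with a UMass-style score: sum, over pairs of a topic's top words, the log of the smoothed co-occurrence document count over the single-word document frequency. The toy basket corpus is hypothetical, and the survey-based interpretation thresholds from the paper are not reproduced here.

    ```python
    import numpy as np

    # Toy corpus: each "document" is the set of products in one basket.
    docs = [{"milk", "bread", "eggs"}, {"milk", "bread"},
            {"wine", "cheese"}, {"wine", "cheese", "bread"}]

    def coherence(top_words):
        # UMass-style coherence: sum_{i>j} log((D(w_i, w_j) + 1) / D(w_j)),
        # where D counts documents containing the word(s).
        score = 0.0
        for i in range(1, len(top_words)):
            for j in range(i):
                wi, wj = top_words[i], top_words[j]
                d_j = sum(wj in d for d in docs)
                d_ij = sum(wi in d and wj in d for d in docs)
                score += np.log((d_ij + 1) / d_j)
        return score
    ```

    Words that genuinely co-occur in baskets (e.g. "milk" and "bread" above) score higher than word pairs that never appear together, which is what makes coherence a useful proxy for human interpretability.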

    Regional Topics in British Grocery Retail Transactions

    Understanding the customer behaviours behind transactional data has high commercial value in the grocery retail industry. Customers generate millions of transactions every day, choosing and buying products to satisfy specific shopping needs. Product availability may vary geographically due to local demand and local supply, which makes it important to analyse transactions within their corresponding store and regional context. Topic models provide a powerful tool for the analysis of transactional data, identifying topics that display frequently-bought-together products and summarising transactions as mixtures of topics. We use the Segmented Topic Model (STM) to capture customer behaviours that are nested within stores. The STM not only provides topics and transaction summaries but also topical summaries at the store level that can be used to identify regional topics. We summarise the posterior distribution of the STM by post-processing multiple posterior samples and selecting semantic modes represented as recurrent topics. We use linear Gaussian process regression to model topic prevalence across the British territory while accounting for spatial autocorrelation. We apply our methods to a dataset of transactional data from a major UK grocery retailer and demonstrate that shopping behaviours may vary regionally and that nearby stores tend to exhibit similar regional demand.
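    The spatial-smoothing step can be sketched as plain GP regression of topic prevalence on store coordinates with an RBF covariance, so that nearby stores receive correlated predictions. Coordinates, prevalences, lengthscale, and noise level are all toy assumptions, not the retailer's data or the paper's exact model.

    ```python
    import numpy as np

    def rbf(A, B, ls=1.0):
        # Squared-exponential covariance between two coordinate sets.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls**2)

    rng = np.random.default_rng(4)
    X = rng.uniform(0.0, 5.0, (30, 2))                 # toy store coordinates
    f = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)    # toy topic prevalence signal

    # GP posterior mean at a new store location: spatial autocorrelation means
    # the prediction borrows strength from nearby stores.
    Xs = np.array([[2.5, 2.5]])
    K = rbf(X, X) + 0.1**2 * np.eye(30)                # noisy training covariance
    mean = rbf(Xs, X) @ np.linalg.solve(K, f)
    ```

    Mapping such posterior means over a grid of locations is what produces the regional prevalence surfaces described in the abstract.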